Abstract

We present a new algorithm for behavioral targeting of banner advertisements.
We record various user actions, such as clicks, search queries and page views,
and use the collected information on each user to estimate in real time the
probability of a click on a banner. A banner is displayed if it either has the
highest probability of being clicked or generates the highest average profit.

Keywords: web advertisement, behavioral targeting, association rule, 
data mining, click-through rate. 

1 Introduction 

The setting of our problem is the following: we are given a finite set of users
and a webserver. At each instant of time, a user u may connect to the webserver
and request a webpage. The webserver responds to the request and inserts into
the webpage an appropriate banner containing an advertisement. The user may
then click on the banner, or not.

We take into account different events: impressions (displays of a banner),
clicks, registrations, page views, keywords in search queries, etc. An impression
event occurs when the webserver responds to a user request for a given webpage
and inserts a banner into the webpage. A click event occurs when the user clicks
on a banner. A registration is a voluntary action of the user after a click on the
banner, such as the purchase of the advertised item or the registration on a site;
registrations may have different levels depending on the profit/value of the action
for the advertiser. A page view is a simple view of a page. A keyword in a search
query is the action of searching for a specific word in a search engine embedded
in a website. We call feature events those events that can be used to study the
behavior of the users. We do not consider clicks and registrations as features,
because our model assumes that features are independent (see equation (3) in
Section 3 for the technical details). In Section 6 we describe a heuristic
improvement of our method that also considers clicks as features, similarly to
methods used in collaborative filtering. A good choice could be to take as
features all voluntary feature events (i.e., excluding impressions).

Each impression, click or registration (purchase of a product, registration on
the advertised site, etc.) of a banner b can generate a profit. For the sake of
simplicity we primarily consider profits generated by clicks, and briefly describe
how impressions and registrations can also be taken into account (see Section 5).

Our goal is to maximize either the profit generated by all clicks or the number
of clicks. The former goal is more general than the latter: if we assign unit
value to every click, the total profit equals the total number of clicks. This
problem has already been treated in the scientific literature (see [1], where a
linear Poisson regression model is used to predict click-through rates, and [5],
where a metric is introduced to assess the quality of behavioral targeting). Our
approach is simpler than other similar approaches in the literature in that it
uses the naive Bayes model. Other approaches are possible, such as those based
on linear programming models ([4], [6], [2]).

For a given user u, we store all events of the user in the cookie maintained by
the browser. The cookie also contains the timestamp of each stored event. The
webserver uses the user's cookie each time the user requests a webpage, in order
to select an appropriate banner.

In this paper, we describe an algorithm that allows the webserver to select 
an appropriate banner, based on the information stored in the user's cookie. 

We define a feature as the presence of a feature event in the user's cookie. We
try to estimate the value of the association rule $f_1, \dots, f_n \to b$, i.e.,
the probability that a user u clicks on b, given that u has features $f_1, \dots, f_n$.

We use information on the features and on the click-through rates. The webserver
keeps track in real time of the click-through rates of all banners among users
that have the same feature.

Moreover, we propose a heuristic, based on the impressions stored in the user's
cookie, to avoid overflooding a user with the same banner.

The paper is organized as follows: in Section 2 we introduce some notation and
definitions; the algorithm for selecting the banner to display is described in
Section 3; the heuristic that limits the number of displays of the same banner
is described in Section 4; in the last sections different generalizations of the
approach are considered (impressions and registrations in Section 5, clicks as
features in Section 6, and generalizations in terms of location and time in
Section 7).
2 Preliminaries



We call the display of an advertisement an "impression". We use three data
structures to store information on users' clicks and impressions in real time:
a user's history, which depends on the user, and a click matrix and an
impression matrix, which are global.

In the following we denote by $B = \{b_1, \dots, b_n\}$ the set of all banners and
by $\mathcal{F}$ the set of all features to be taken into consideration.

Definition 1 (User's history). For every user u we define their history as the
set of triples (type, obj, time) that describe all events of user u together with
their timestamps, where type is the type of event (impression, click, page view,
search query, etc.), obj is either the displayed or clicked banner, the URL of the
viewed page or the keyword in the search query, and time is the timestamp of the
event.

Definition 2 (User's profile). For every user u we define their profile
$\mathcal{P}_u = (\mathcal{F}_u, S_u, C_u)$, where $\mathcal{F}_u \subseteq \mathcal{F}$ is the set of features of u,
$S_u : B \to \mathbb{N}$ maps each banner to the number of its impressions to user u,
and $C_u : B \to \mathbb{N}$ maps each banner to the number of its clicks by user u.

Remark 1. The user's history is the only data structure that needs to be stored
in the user's cookie. The user's profile can be derived from the history and is
introduced only for the sake of simplicity.

Definition 3. We denote by $S = (s_{ij})$ the impression matrix, where $s_{ij}$ is
the number of impressions of banner $b_j$ among users u that have feature i, i.e.,
$i \in \mathcal{F}_u$.

Definition 4. We denote by $C = (c_{ij})$ the click matrix, where $c_{ij}$ is the number
of clicks on banner $b_j$ among users u who have feature i, i.e., $i \in \mathcal{F}_u$.
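As an illustration of Definitions 1 and 2 and of Remark 1, the following Python sketch derives a user's profile from the history stored in the cookie. The event-type names ("impression", "click", "page_view", "search_query"), the "type:obj" encoding of features and the set FEATURE_TYPES are hypothetical choices made for the example, not part of the method as described here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Set, Tuple

# A history entry (type, obj, time), as in Definition 1.
Event = Tuple[str, str, float]

# Hypothetical choice of which event types count as feature events.
FEATURE_TYPES = {"page_view", "search_query"}


@dataclass
class Profile:
    """User's profile P_u = (F_u, S_u, C_u) from Definition 2."""
    features: Set[str] = field(default_factory=set)  # F_u
    impressions: DefaultDict[str, int] = field(
        default_factory=lambda: defaultdict(int))    # S_u: banner -> #impressions
    clicks: DefaultDict[str, int] = field(
        default_factory=lambda: defaultdict(int))    # C_u: banner -> #clicks


def profile_from_history(history: List[Event]) -> Profile:
    """Derive the profile from the history kept in the cookie (Remark 1)."""
    profile = Profile()
    for etype, obj, _time in history:
        if etype == "impression":
            profile.impressions[obj] += 1
        elif etype == "click":
            profile.clicks[obj] += 1
        elif etype in FEATURE_TYPES:
            # A feature is the presence of a feature event, encoded as "type:obj".
            profile.features.add(f"{etype}:{obj}")
    return profile
```

Only the history needs to be serialized into the cookie; the profile is recomputed from it whenever a banner has to be selected.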

3 The banner selection algorithm 

Assume that a user u requests a webpage from the webserver, which responds by
sending the webpage with an appropriate banner inserted into it. We now describe
the general strategy for selecting the banner, based on the information stored in
the history of u. We denote by $P(b)$ the global probability for a banner b to be
clicked and by $P(b \mid f_1, \dots, f_n)$ the probability for a banner b to be clicked
by a user with features $f_1, \dots, f_n$. We also assume that $P(b) \neq 0$ and
$P(b \mid f_i) \neq 0$ for all i. In particular, we want to maximize the probability
$P(b \mid f_1, \dots, f_n)$.

Given a set K of candidate banners that may be selected by the webserver, for
each $b \in K$ we compute a score $\mathrm{score}(b)$. The banner with the highest score
is then selected by the webserver.

As score we take

$$\mathrm{score}(b) = \mathrm{cpc}(b) \cdot \mathrm{rule}(f_1, \dots, f_n \to b), \qquad (1)$$

where 
• $\mathrm{cpc}(b)$ is the "cost per click" (the profit) of b;

• $\mathrm{rule}(f_1, \dots, f_n \to b)$ is defined as follows:

$$\mathrm{rule}(f_1, \dots, f_n \to b) = P(b) \prod_{i=1}^{n} \frac{P(b \mid f_i)}{P(b)}. \qquad (2)$$

Given two expressions $\alpha$ and $\beta$, we use the notation $\alpha \propto \beta$ to mean
$\alpha = c\beta$, where c does not depend on b. We then have the following fact.

Fact 1. Under the hypothesis that the features $f_i$ are independent events, we
have

$$\mathrm{rule}(f_1, \dots, f_n \to b) \propto P(b \mid f_1, \dots, f_n). \qquad (3)$$

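Proof. By applying Bayes' theorem twice, under the simplifying hypothesis of
independent features (so that $P(f_1, \dots, f_n \mid b) = \prod_i P(f_i \mid b)$), we have

$$\begin{aligned}
P(b \mid f_1, \dots, f_n) &= \frac{P(b)\, P(f_1, \dots, f_n \mid b)}{P(f_1, \dots, f_n)} \propto P(b)\, P(f_1, \dots, f_n \mid b) \\
&= P(b) \prod_{i=1}^{n} P(f_i \mid b) = P(b) \prod_{i=1}^{n} \frac{P(b \mid f_i)\, P(f_i)}{P(b)} \\
&\propto P(b) \prod_{i=1}^{n} \frac{P(b \mid f_i)}{P(b)} = \mathrm{rule}(f_1, \dots, f_n \to b), \qquad (4)
\end{aligned}$$

since neither $P(f_1, \dots, f_n)$ nor $\prod_i P(f_i)$ depends on b. □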



Therefore the banner with the highest rule value is the banner with the highest
probability of being clicked, and the banner with the highest score is the banner
that generates the highest expected profit from clicks per impression.
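A minimal Python sketch of the selection step defined by (1) and (2), assuming that the probabilities $P(b)$ and $P(b \mid f_i)$ are already available as numbers. The function names, the dictionary layout and the numeric values in the usage lines are illustrative assumptions.

```python
from math import prod
from typing import Dict, Iterable, Sequence


def rule(p_b: float, p_b_given_f: Sequence[float]) -> float:
    """Equation (2): P(b) * prod_i P(b | f_i) / P(b); assumes P(b) != 0."""
    return p_b * prod(q / p_b for q in p_b_given_f)


def select_banner(candidates: Iterable[str],
                  cpc: Dict[str, float],
                  p_b: Dict[str, float],
                  p_b_given_f: Dict[str, Sequence[float]]) -> str:
    """Return the candidate with the highest score(b) = cpc(b) * rule(...), eq. (1)."""
    return max(candidates, key=lambda b: cpc[b] * rule(p_b[b], p_b_given_f[b]))


# Illustrative numbers: two banners, a user with two features.
p_b = {"b1": 0.02, "b2": 0.01}
p_b_given_f = {"b1": [0.01, 0.02], "b2": [0.03, 0.02]}
print(select_banner(["b1", "b2"], {"b1": 1.0, "b2": 0.5}, p_b, p_b_given_f))  # b2
print(select_banner(["b1", "b2"], {"b1": 1.0, "b2": 0.1}, p_b, p_b_given_f))  # b1
```

Here rule(b1) is about 0.01 and rule(b2) about 0.06, so b2 is the more likely click; with a low enough cost per click, however, b1 yields the higher expected profit and is selected.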

3.1 Click-through rates 

In order to compute the probabilities $P(b)$ and $P(b \mid f_i)$ for $i = 1, \dots, n$ in (4),
we use the concept of click-through rate:

$$\mathrm{ctr}(b) = \frac{\#\,\text{clicks on } b}{\#\,\text{impressions of } b} \quad \text{(among all users)},$$

$$\mathrm{ctr}_f(b) = \frac{\#\,\text{clicks on } b}{\#\,\text{impressions of } b} \quad \text{(among users with feature } f\text{)}. \qquad (5)$$

Therefore we can estimate the probabilities $P(b)$ and $P(b \mid f_i)$ as click
frequencies:

$$P(b) = \mathrm{ctr}(b), \qquad P(b \mid f_i) = \mathrm{ctr}_{f_i}(b). \qquad (6)$$
Thus we can write

$$\mathrm{rule}(f_1, \dots, f_n \to b) = \mathrm{ctr}(b) \prod_{i=1}^{n} \frac{\mathrm{ctr}_{f_i}(b)}{\mathrm{ctr}(b)} = \mathrm{ctr}(b)^{1-n} \prod_{i=1}^{n} \mathrm{ctr}_{f_i}(b). \qquad (7)$$
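The two forms of (7) can be checked numerically with the sketch below, which estimates the rates from raw counts as in (5). The helper names and the counts are illustrative assumptions; as in the text, every impression count is assumed to be nonzero.

```python
from math import isclose, prod
from typing import Sequence


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as in (5); assumes impressions > 0."""
    return clicks / impressions


def rule_product_form(ctr_b: float, ctr_f: Sequence[float]) -> float:
    """First form of (7): ctr(b) * prod_i ctr_{f_i}(b) / ctr(b)."""
    return ctr_b * prod(c / ctr_b for c in ctr_f)


def rule_power_form(ctr_b: float, ctr_f: Sequence[float]) -> float:
    """Second form of (7): ctr(b)^(1-n) * prod_i ctr_{f_i}(b)."""
    return ctr_b ** (1 - len(ctr_f)) * prod(ctr_f)


# Illustrative counts: banner b overall, and among users with features f_1, f_2.
ctr_b = ctr(20, 1000)               # 0.02
ctr_f = [ctr(3, 300), ctr(8, 400)]  # 0.01 and 0.02
assert isclose(rule_product_form(ctr_b, ctr_f), rule_power_form(ctr_b, ctr_f))
print(rule_power_form(ctr_b, ctr_f))  # approximately 0.01
```

The power form needs only one division per banner, which may be convenient when the rule is recomputed for many candidate banners at every request.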

3.2 Updating relative click-through rates 

To keep the click-through rates for each feature up to date, we need to update
the impression matrix and the click matrix in real time. The click-through rate
of banner $b_j$ for a given feature i is then computed from the clicks and
impressions in the i-th rows of the two matrices:

$$\mathrm{ctr}_i(b_j) = \frac{c_{ij}}{s_{ij}}. \qquad (8)$$

We consider the set $B = \{b_1, \dots, b_n\}$ of all banners. In order to update the
matrices after each impression and click of every user, first the user's history
(and profile) are updated, and then the following actions are taken on the
matrices:

• If there is an impression of banner $b_j$ to user u: for every feature i of u,
$s_{ij}$ is increased by one, i.e., for every i such that $i \in \mathcal{F}_u$ we set
$s_{ij} := s_{ij} + 1$.

• If there is a click on banner $b_j$ by a user u that has already clicked on $b_j$,
i.e., $C_u(b_j) > 0$: for every feature i of u, $c_{ij}$ is increased by one, i.e.,
for every i such that $i \in \mathcal{F}_u$ we set $c_{ij} := c_{ij} + 1$.

• If there is a feature event i by a user u that did not have that feature: the
i-th rows of S and C are updated with, respectively, the impressions and the
clicks in the user's profile: $s_{ij} := s_{ij} + S_u(b_j)$ and $c_{ij} := c_{ij} + C_u(b_j)$
for all j.
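The three update rules, together with (8), can be sketched in Python as follows. This is a sketch under assumptions: the sparse nested-dictionary storage, the class and method names, and returning None when $s_{ij} = 0$ are illustrative choices. Because the profile is updated first, the sketch reads the condition $C_u(b_j) > 0$ in the click rule as always holding at the moment the matrices are updated.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class UserProfile:
    features: Set[str] = field(default_factory=set)            # F_u
    impressions: Dict[str, int] = field(default_factory=dict)  # S_u
    clicks: Dict[str, int] = field(default_factory=dict)       # C_u


class Matrices:
    """Global impression matrix S and click matrix C, stored as S[i][j], C[i][j]."""

    def __init__(self) -> None:
        self.S: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.C: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def impression(self, u: UserProfile, j: str) -> None:
        u.impressions[j] = u.impressions.get(j, 0) + 1  # 1) update the profile
        for i in u.features:                             # 2) s_ij := s_ij + 1
            self.S[i][j] += 1

    def click(self, u: UserProfile, j: str) -> None:
        # Profile first, so C_u(b_j) > 0 holds when the matrix is updated.
        u.clicks[j] = u.clicks.get(j, 0) + 1
        for i in u.features:                             # c_ij := c_ij + 1
            self.C[i][j] += 1

    def feature_event(self, u: UserProfile, i: str) -> None:
        if i in u.features:                              # only new features count
            return
        u.features.add(i)
        for j, n in u.impressions.items():               # s_ij := s_ij + S_u(b_j)
            self.S[i][j] += n
        for j, n in u.clicks.items():                    # c_ij := c_ij + C_u(b_j)
            self.C[i][j] += n

    def ctr(self, i: str, j: str) -> Optional[float]:
        """Equation (8): ctr_i(b_j) = c_ij / s_ij, or None when s_ij = 0."""
        s = self.S.get(i, {}).get(j, 0)
        return self.C.get(i, {}).get(j, 0) / s if s else None
```

The third rule back-fills a new feature row with the user's past impressions and clicks, so a feature acquired late still contributes that user's earlier behavior to $\mathrm{ctr}_i(b_j)$.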



4 Avoiding user's boredom 

In this section we describe our strategy for avoiding overflooding a user with
the same banner. We achieve this by "throttling down" the value of the candidate
banner, taking into account the times at which the banner has already been
displayed to the user. The value $\mathrm{rule}(f_1, \dots, f_n \to b)$ is multiplied by a
scaling factor $\mathrm{throttle}_u(b)$ with the following properties:

I. $0 < \mathrm{throttle}(b) < 1$, so that the value is scaled down.

II. $\mathrm{throttle}(b)$ decreases with the number of impressions of b.

III. $\mathrm{throttle}(b)$ decreases more if the impressions are more recent, and it
increases again as the impressions recede into the past.
Hence we have

$$\mathrm{score}(b) = \mathrm{cpc}(b) \cdot \mathrm{rule}(f_1, \dots, f_n \to b) \cdot \mathrm{throttle}(b).$$



In particular, we choose the following function. Let t be the current instant of
time, and let $t_i$, for $i = 1, \dots, m$, be the instants of time at which an
impression event for banner b and user u has occurred.



where $0 < a < 1$ and h are heuristically selected parameters.

Thus formula (9) can avoid overflooding the user with the same banners and can
improve the estimation of the probability of a click. We therefore have the
following interpretations:

• $\mathrm{rule}(f_1, \dots, f_n \to b) \cdot \mathrm{throttle}(b)$ is an approximation of the probability of
a click on banner b by a user who has features $f_1, \dots, f_n$ and has already
seen certain impressions of banner b.

• $\mathrm{score}(b)$ is an approximation of the expected profit generated solely by a
possible click after an impression of banner b.
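The sketch below illustrates how a throttling factor enters the score. The function throttle is one hypothetical choice that satisfies properties I-III; it is not necessarily the function (9) with parameters a and h chosen in this paper, and its parameters strength and half_life are assumptions made for the example.

```python
import math
from typing import Sequence


def throttle(now: float, impression_times: Sequence[float],
             strength: float = 0.5, half_life: float = 3600.0) -> float:
    """Hypothetical throttle_u(b), NOT the paper's formula (9).

    Each past impression at time t_i contributes a factor
    1 - strength * 2 ** (-(now - t_i) / half_life), which equals 1 - strength
    for an impression shown right now and tends to 1 as the impression ages.
    The product therefore decreases with the number of impressions (II) and
    more so for recent ones (III); it is exactly 1 when there are none.
    """
    if not 0 < strength < 1:
        raise ValueError("strength must lie in (0, 1)")
    return math.prod(
        1 - strength * 2 ** (-max(0.0, now - t) / half_life)  # ignore future times
        for t in impression_times
    )


def throttled_score(cpc: float, rule_value: float, throttle_value: float) -> float:
    """score(b) = cpc(b) * rule(f_1, ..., f_n -> b) * throttle(b)."""
    return cpc * rule_value * throttle_value
```

For example, with strength=0.5 and half_life=10.0, a single impression shown at the current time gives a factor of 0.5, one shown ten time units earlier gives 0.75, and both together give 0.375.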

5 Impressions and Registrations 

In the most general case we may have banners that generate a profit for each 
impression, click and registration. 

For each candidate banner b, we can take into account profits generated by
both impressions and clicks by considering

$$\mathrm{score}^{+}(b) = \mathrm{imp\_profit}(b) + \mathrm{score}(b), \qquad (10)$$

where $\mathrm{imp\_profit}(b)$ is the profit generated by the display of b.

This approach can be further generalized to encompass registrations of any
step by simply treating them as click-like events.

6 Click events as features

Our approach considers only non-click events as features and assumes that the
features are independent of each other. We may improve the accuracy of our
predictions by also considering click events, but they would have to be treated
differently, because the basic assumption of independence does not hold for them.
Clicks could be treated similarly to purchases in a collaborative filtering
approach. We record each unique click in a user × banner matrix. The probability
$P(b \mid c)$ of a click on banner b by a user who has clicked on c will then depend
on whether other users who have clicked on c have also clicked on b. For more
details on a practical use of this approach we refer to [3], where it is used
for a recommender system.
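One simple way to turn the user × banner click matrix into an estimate of $P(b \mid c)$ is a co-click frequency, sketched below. This estimator is an illustrative assumption, not the method of [3] or a prescribed part of this approach; the matrix is stored as one set of clicked banners per user, which records unique clicks only.

```python
from typing import Dict, Optional, Set


def co_click_probability(clicks_by_user: Dict[str, Set[str]],
                         b: str, c: str) -> Optional[float]:
    """Estimate P(b | c) as the share of users who clicked c that also clicked b.

    clicks_by_user holds the rows of the user x banner matrix of unique clicks:
    each user maps to the set of banners they have clicked at least once.
    Returns None when no user has clicked c.
    """
    clicked_c = [banners for banners in clicks_by_user.values() if c in banners]
    if not clicked_c:
        return None
    return sum(b in banners for banners in clicked_c) / len(clicked_c)


# Illustrative data: u1 and u4 clicked b1 together with another banner.
clicks = {"u1": {"b1", "b2"}, "u2": {"b1"}, "u3": {"b2", "b3"}, "u4": {"b1", "b3"}}
print(co_click_probability(clicks, "b2", "b1"))  # 1 of the 3 users who clicked b1
```

Unlike the feature-based rates of Section 3, this estimate is computed per pair of banners, so it captures the dependence between clicks that the independence assumption cannot.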
7 Space and time



This approach can be generalized in terms of time and space, i.e., the location
of a banner.

We can take these attributes into account in order to better target users at
specific times with banners at specific locations. This can be achieved by
simply recording this data in an extra dimension of the matrices S and C
described in Section 3.2.
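A minimal sketch of the extra dimension, assuming that a location is identified by a string such as a page slot and that time is bucketed by hour of day; both choices, and the class name, are hypothetical. The counts are keyed by (feature, banner, location, hour) instead of (feature, banner), and (8) is evaluated per key.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# (feature i, banner b_j, location, hour of day): hypothetical extra dimensions.
Key = Tuple[str, str, str, int]


class ContextMatrices:
    """Impression and click counts S, C extended with location and time slot."""

    def __init__(self) -> None:
        self.S: Dict[Key, int] = defaultdict(int)
        self.C: Dict[Key, int] = defaultdict(int)

    @staticmethod
    def key(i: str, j: str, location: str, when: datetime) -> Key:
        return (i, j, location, when.hour)  # time bucketed by hour of day

    def impression(self, features: Iterable[str], j: str,
                   location: str, when: datetime) -> None:
        for i in features:
            self.S[self.key(i, j, location, when)] += 1

    def click(self, features: Iterable[str], j: str,
              location: str, when: datetime) -> None:
        for i in features:
            self.C[self.key(i, j, location, when)] += 1

    def ctr(self, i: str, j: str, location: str, when: datetime) -> Optional[float]:
        """Equation (8) restricted to one location and time slot."""
        k = self.key(i, j, location, when)
        s = self.S.get(k, 0)
        return self.C.get(k, 0) / s if s else None
```

Finer buckets make the rates more specific but also sparser, so each slot needs enough impressions before its click-through rate is meaningful.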

8 Future work 

This approach could be further developed, improved and generalized in several
respects: in how the features are treated (by considering clusters of features
instead of single features, or by considering non-Boolean features) and in its
applications.

8.1 Non-Boolean features 

We can assign to each feature a counter that could be used in an extended
definition of the value of $\mathrm{rule}(f_1, \dots, f_n \to b)$. One possibility could be
to consider

$$\mathrm{rule}^{*}(f_1, \dots, f_n \to b) := W(c_1, \dots, c_n) \cdot \mathrm{rule}(f_1, \dots, f_n \to b), \qquad (11)$$

where

• $c_i$ is the counter of feature $f_i$ (possibly normalized with respect to the
average), for $i = 1, \dots, n$;

• $W(c_1, \dots, c_n)$ is a measure of how the counters should correct the
association rule $f_1, \dots, f_n \to b$.

A straightforward candidate for $W(c_1, \dots, c_n)$ could be the simple arithmetic
average:

$$W(c_1, \dots, c_n) := \frac{c_1 + \dots + c_n}{n}. \qquad (12)$$

This could be used to differentiate between a single page view (or a single 
search query with a keyword) and multiple page views (multiple search queries 
with the same keyword). 
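Equations (11) and (12) can be sketched as follows. The normalization step divides each counter by an average counter for that feature; where such averages come from is an assumption of the example, as are the function names.

```python
from typing import List, Sequence


def normalize(counters: Sequence[float], averages: Sequence[float]) -> List[float]:
    """Optional step: divide each counter c_i by a (hypothetical) average counter
    for feature f_i, e.g. the mean number of page views of that page per user."""
    return [c / a for c, a in zip(counters, averages)]


def w_average(counters: Sequence[float]) -> float:
    """Equation (12): W(c_1, ..., c_n) = (c_1 + ... + c_n) / n."""
    return sum(counters) / len(counters)


def rule_star(rule_value: float, counters: Sequence[float]) -> float:
    """Equation (11): rule*(f_1, ..., f_n -> b) = W(c_1, ..., c_n) * rule(...)."""
    return w_average(counters) * rule_value
```

For instance, a user who viewed a page 3 times (average 1.5) and searched for a keyword once (average 1) gets normalized counters [2.0, 1.0], so W = 1.5 and the rule value is scaled up by 50%.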

8.2 Application to on-line newspapers and magazines 

This approach could also be applied to on-line newspapers and magazines by
treating the display of an article's title as an impression and a click on the
title as a click on a banner.